636 research outputs found

    Discovering Regression Rules with Ant Colony Optimization

    The majority of Ant Colony Optimization (ACO) algorithms for data mining have dealt with classification or clustering problems. Regression remains an unexplored research area to the best of our knowledge. This paper proposes a new ACO algorithm that generates regression rules for data mining applications. The new algorithm combines components from an existing deterministic (greedy) separate-and-conquer algorithm, employing the same quality metrics and continuous attribute processing techniques, which allows a comparison of the two. The new algorithm has been shown to decrease the relative root mean square error when compared to the greedy algorithm. Additionally, a different approach to handling continuous attributes was investigated, showing that further improvements were possible.

    Automating the Hunt for Volcanoes on Venus

    Our long-term goal is to develop a trainable tool for locating patterns of interest in large image databases. Toward this goal, we have developed a prototype system, based on classical filtering and statistical pattern recognition techniques, for automatically locating volcanoes in the Magellan SAR database of Venus. Training for the specific volcano-detection task is obtained by synthesizing feature templates (via normalization and principal components analysis) from a small number of examples provided by experts. Candidate regions identified by a focus of attention (FOA) algorithm are classified based on correlations with the feature templates. Preliminary tests show performance comparable to trained human observers.

    Condition monitoring of an advanced gas-cooled nuclear reactor core

    A critical component of an advanced gas-cooled reactor station is the graphite core. As a station ages, the graphite bricks that comprise the core can distort and may eventually crack. Since the core cannot be replaced, the core integrity ultimately determines the station life. Monitoring these distortions is usually restricted to the routine outages, which occur every few years, as this is the only time that the reactor core can be accessed by external sensing equipment. This paper presents a monitoring module based on model-based techniques using measurements obtained during the refuelling process. A fault detection and isolation filter based on unknown input observer techniques is developed. The role of this filter is to estimate the friction force produced by the interaction between the wall of the fuel channel and the fuel assembly supporting brushes. This allows the shape of the graphite bricks that comprise the core to be estimated and, therefore, any distortion in them to be monitored.

    Using an Ant Colony Optimization Algorithm for Monotonic Regression Rule Discovery

    Many data mining algorithms do not make use of existing domain knowledge when constructing their models. This can lead to model rejection, as users may not trust models that behave contrary to their expectations. Semantic constraints provide a way to encapsulate this knowledge, which can then be used to guide the construction of models. One of the most studied semantic constraints in the literature is monotonicity; however, current monotonicity-aware algorithms have focused on ordinal classification problems. This paper proposes an extension to an ACO-based regression algorithm in order to extract a list of monotonic regression rules. We compared the proposed algorithm against a greedy regression rule induction algorithm that preserves monotonic constraints and the well-known M5’ Rules. Our experiments using eight publicly available data sets show that the proposed algorithm successfully creates monotonic rules while maintaining predictive accuracy.

    Learning Interestingness of Streaming Classification Rules

    Inducing classification rules on domains from which information is gathered at regular periods generally produces so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced using the newly gathered information from the domain. These rules therefore stream through time and are called streaming classification rules. In this paper, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label the classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, VFP (Voting Feature Projections), a feature projection based incremental classification learning algorithm, is also developed in the framework of IRIL. The concept description learned by the VFP algorithm constitutes a novel approach for interestingness analysis of streaming classification rules. © Springer-Verlag 2004

    Detecting Large Concept Extensions for Conceptual Analysis

    When performing a conceptual analysis of a concept, philosophers are interested in all forms of expression of a concept in a text, be it direct or indirect, explicit or implicit. In this paper, we experiment with topic-based methods of automating the detection of concept expressions in order to facilitate philosophical conceptual analysis. We propose six methods based on LDA, and evaluate them on a new corpus of court decisions that we had annotated by experts and non-experts. Our results indicate that these methods can yield important improvements over the keyword heuristic, which is often used for concept detection in many contexts. While more work remains to be done, this indicates that detecting concepts through topics can serve as a general-purpose method for at least some forms of concept expression that are not captured using naive keyword approaches.

    SkICAT: A cataloging and analysis tool for wide field imaging surveys

    We describe an integrated system, SkICAT (Sky Image Cataloging and Analysis Tool), for the automated reduction and analysis of the Palomar Observatory-STScI Digitized Sky Survey. The Survey will consist of the complete digitization of the photographic Second Palomar Observatory Sky Survey (POSS-II) in three bands, comprising nearly three Terabytes of pixel data. SkICAT applies a combination of existing packages, including FOCAS for basic image detection and measurement and SAS for database management, as well as custom software, to the task of managing this wealth of data. One of the most novel aspects of the system is its method of object classification. Using state-of-the-art machine learning classification techniques (GID3* and O-BTree), we have developed a powerful method for automatically distinguishing point sources from non-point sources and artifacts, achieving comparably accurate discrimination a full magnitude fainter than in previous Schmidt plate surveys. The learning algorithms produce decision trees for classification by examining instances of objects classified by eye on both plate and higher quality CCD data. The same techniques will be applied to perform higher-level object classification (e.g., of galaxy morphology) in the near future. Another key feature of the system is the facility to integrate the catalogs from multiple plates (and portions thereof) to construct a single catalog of uniform calibration and quality down to the faintest limits of the survey. SkICAT also provides a variety of data analysis and exploration tools for the scientific utilization of the resulting catalogs. We include initial results of applying this system to measure the counts and distribution of galaxies in two bands down to Bj of approximately 21 mag over a multi-plate field of approximately 70 square degrees from POSS-II. SkICAT is constructed in a modular and general fashion and should be readily adaptable to other large-scale imaging surveys.

    A Mixed-Attribute Approach in Ant-Miner Classification Rule Discovery Algorithm

    In this paper, we introduce Ant-MinerMA to tackle mixed-attribute classification problems. Most classification problems involve continuous, ordinal and categorical attributes. The majority of Ant Colony Optimization (ACO) classification algorithms can handle categorical attributes only, with a few exceptions that use a discretisation procedure to handle continuous attributes, either in a preprocessing stage or during rule creation. Using a solution archive as a pheromone model, inspired by the ACO for mixed-variable optimization (ACO-MV), we eliminate the need for a discretisation procedure, and attributes can be treated directly as continuous, ordinal, or categorical. We compared the proposed Ant-MinerMA against cAnt-Miner, an ACO-based classification algorithm that uses a discretisation procedure in the rule construction process. Our results show that Ant-MinerMA achieved significant improvements in computational time due to the elimination of the discretisation procedure, without affecting the predictive performance.

    Data Analytics for the Cryptocurrencies Behavior

    Cryptocurrencies are a new paradigm for transferring money between users. Their anonymous and decentralized nature is a subject of debate around the globe, as are the massive spikes and declines in value that are inherent to an unregistered asset. These facts make it difficult to use cryptocurrencies as an everyday exchange currency; instead, they are being used as a new way to invest. What we propose in this article is a system for better understanding the economic behavior of cryptocurrencies against the global market. To that end, we use Data Analytics techniques to build a predictor that takes external financial variables as inputs. Such forecasts would help determine whether a coin is safe to trade with, provided they can be precise using only this external data. The results obtained indicate that the global market has a certain degree of influence on cryptocurrencies, but that it is not enough to correctly predict the fluctuations in the price of the coins; the coins depend more on other factors and have their own bubbles, like the crypto collapse in late 2017. Instituto de Investigación en Informátic

    The Art of Data Science

    To flourish in the new data-intensive environment of 21st century science, we need to evolve new skills. These can be expressed in terms of the systemized framework that formed the basis of mediaeval education - the trivium (logic, grammar, and rhetoric) and quadrivium (arithmetic, geometry, music, and astronomy). However, rather than focusing on number, data is the new keystone. We need to understand what rules it obeys, how it is symbolized and communicated and what its relationship to physical space and time is. In this paper, we will review this understanding in terms of the technologies and processes that it requires. We contend that, at least, an appreciation of all these aspects is crucial to enable us to extract scientific information and knowledge from the data sets which threaten to engulf and overwhelm us. Comment: 12 pages, invited talk at Astrostatistics and Data Mining in Large Astronomical Databases workshop, La Palma, Spain, 30 May - 3 June 2011, to appear in Springer Series on Astrostatistic